A preselection method based on cost degradation from the optimal sequence for concatenative speech synthesis
Abstract
A novel unit preselection criterion for concatenative speech synthesis is proposed. To reduce the computational cost of unit selection, units that are unlikely to be selected should be pruned in a preselection step before the Viterbi search. The criterion is defined as the difference between the cost of the locally optimal sequence in which a given unit is fixed and the cost of the globally optimal sequence, so preselection can take into account not only the target cost but also the concatenation cost. For real-time speech synthesis, a preselection method using decision trees, in which a unit can be bound to multiple nodes of a tree, is also introduced. A unit selection experiment shows that the proposed method, using decision trees built from 8 hours of training data, outperforms conventional online preselection based on target costs in terms of the costs of the selected units. The results also show that the method is more effective when the computational cost is strongly limited.
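The criterion described in the abstract can be illustrated with a forward and a backward min-sum (Viterbi-style) pass over the candidate lattice. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `cost_degradation` and `preselect`, the list-based cost layout, and the fixed pruning threshold are all hypothetical, and the decision-tree stage is not shown.

```python
from typing import List


def cost_degradation(target: List[List[float]],
                     concat: List[List[List[float]]]) -> List[List[float]]:
    """Cost degradation of every candidate unit (hypothetical data layout).

    target[t][i]    : target cost of candidate i at position t.
    concat[t][i][j] : concatenation cost between candidate i at position t
                      and candidate j at position t + 1.
    Returns d[t][i] = (cost of the best sequence forced through unit (t, i))
                      - (cost of the globally best sequence), which is >= 0.
    """
    n = len(target)

    # Forward pass: cheapest partial sequence that ends at unit (t, j).
    fwd = [list(target[0])]
    for t in range(1, n):
        prev = fwd[-1]
        fwd.append([
            target[t][j]
            + min(prev[i] + concat[t - 1][i][j] for i in range(len(prev)))
            for j in range(len(target[t]))
        ])

    # Backward pass: cheapest continuation after unit (t, i), excluding it.
    bwd = [[0.0] * len(target[t]) for t in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(len(target[t])):
            bwd[t][i] = min(
                concat[t][i][j] + target[t + 1][j] + bwd[t + 1][j]
                for j in range(len(target[t + 1]))
            )

    best = min(fwd[-1])  # cost of the globally optimal sequence
    return [[fwd[t][i] + bwd[t][i] - best for i in range(len(target[t]))]
            for t in range(n)]


def preselect(degradation: List[List[float]],
              threshold: float) -> List[List[int]]:
    """Keep the indices of units whose degradation is within the threshold."""
    return [[i for i, d in enumerate(row) if d <= threshold]
            for row in degradation]
```

Because both passes include concatenation costs, a unit with a low target cost can still receive a large degradation if it joins poorly with its neighbours, which is the property the abstract contrasts with preselection based on target costs alone.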
Similar resources
Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis
... Spectrum at each segment boundary for calculation of concatenation cost. (2) Synthesis stage. Text-to-Feature: generate features from input text (linguistic/prosodic information). Feature-to-Speech: find the N-best candidates in each frame (preselection) according to the segment's target cost; find the best path from the N-best candidates based on concatenation cost; concatenate the segments...
Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations
This paper describes optimizing a cost function for segment selection in concatenative Text-to-Speech based on perceptual characteristics. We use the norm of a local cost for each segment as an integrated cost function for a segment sequence to consider both the degradation of naturalness over the entire synthetic speech and the local degradation. The cost function is optimized by adjusting not...
Improving preselection in unit selection synthesis
Unit selection synthesis is a method of selecting and concatenating speech segments from a large single-speaker audio database to synthesize utterances. Selection is based on assigning target and concatenation costs to units and then finding a lowest cost sequence of units that will synthesize a given utterance. In order to synthesize efficiently, it is necessary to limit the number of units co...
Segment selection considering local degradation of naturalness in concatenative speech synthesis
In this paper, we investigate the effect of using a novel cost, RMS (Root Mean Square) cost, for segment selection for concatenative Text-to-Speech. The RMS cost is affected not only by the total degradation of naturalness but also by the local degradation of naturalness. From the results of experiments comparing this approach with segment selection based on a conventional average cost, it is f...
Perceptual Evaluation of Cost for Segment Selection in Concatenative Speech Synthesis
In segment selection for concatenative Text-to-Speech (TTS), it is important to utilize a cost that corresponds to the perceptual characteristics. We clarify correspondence to the perceptual scores of the cost, and then various functions to integrate the costs are evaluated. The perceptual scores are determined from results of perceptual experiments on the naturalness of synthetic spee...
Publication date: 2007